Of the 1,000 films in the database, the average film costs about $19.98 to replace, and individual costs range from $9.99 to $29.99. To build intuition for why some films are more expensive to replace than others, it can help to explore the table below (Figure 1), where films are sorted by replacement cost. It is difficult to learn much this way, and as the rest of this analysis shows, there may not be a definitive connection between a film’s characteristics and its replacement cost at all.
Note: Language ID and Release Year each have only a single unique value across all of the data, so neither can explain differences in replacement cost.
Figure 1
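The summary above (mean, range, sorted table, and the check for single-valued columns) can be reproduced with a few lines of Python. This is only a minimal sketch: the rows, titles, and column names below are hypothetical stand-ins and are not taken from the database.

```python
from statistics import mean

# Hypothetical rows standing in for the film table; titles and values are
# made up for illustration and do not come from the database.
films = [
    {"title": "Film A", "language_id": 1, "release_year": 2006, "replacement_cost": 9.99},
    {"title": "Film B", "language_id": 1, "release_year": 2006, "replacement_cost": 29.99},
    {"title": "Film C", "language_id": 1, "release_year": 2006, "replacement_cost": 19.99},
    {"title": "Film D", "language_id": 1, "release_year": 2006, "replacement_cost": 14.99},
]


def constant_columns(rows):
    """Return the column names whose value is identical in every row."""
    return [key for key in rows[0] if len({row[key] for row in rows}) == 1]


costs = [film["replacement_cost"] for film in films]
summary = {
    "count": len(costs),
    "mean": round(mean(costs), 2),
    "min": min(costs),
    "max": max(costs),
}

# Most expensive first, as in a table sorted by replacement cost.
by_cost = sorted(films, key=lambda film: film["replacement_cost"], reverse=True)

print(summary)
print("single-valued columns:", constant_columns(films))
for film in by_cost:
    print(f'{film["title"]}: {film["replacement_cost"]:.2f}')
```

Columns reported by `constant_columns` carry no information about replacement cost and can be dropped before any further analysis.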
The histogram of replacement costs (Figure 2) reveals the lack of a clear pattern among the replacement costs. To further explore what drives replacement cost, we can look at its relationship to other variables.
Figure 2
Figure 3 shows that there is no meaningful linear relationship between a film’s numerical characteristics (rental duration, rental rate, and length) and its replacement cost. The variable whose correlation with replacement cost has the largest magnitude is rental rate, at -0.0446. Such a weak correlation is a further reason to explore the relationship between replacement cost and the categorical characteristics of films.
Figure 3
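For readers who want to reproduce this kind of check, the sketch below computes the Pearson correlation of each numerical column with replacement cost and picks the one with the largest magnitude. The column names and values are hypothetical and chosen only for illustration, so the numbers it prints are not the ones shown in Figure 3.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical columns; the values are invented for this sketch.
films = {
    "rental_duration": [3, 5, 7, 4, 6, 3],
    "rental_rate": [0.99, 2.99, 4.99, 0.99, 2.99, 4.99],
    "length": [86, 120, 48, 175, 92, 130],
    "replacement_cost": [20.99, 12.99, 18.99, 26.99, 9.99, 15.99],
}

target = films["replacement_cost"]
correlations = {
    name: pearson(values, target)
    for name, values in films.items()
    if name != "replacement_cost"
}

# Rank by absolute value, since a strong negative correlation matters as much
# as a strong positive one.
strongest = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in correlations.items():
    print(f"{name}: {r:+.4f}")
print("largest magnitude:", strongest)

# Pearson only captures linear association: a perfectly U-shaped relationship
# has a coefficient of zero.
u_shaped = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The last line illustrates why a near-zero coefficient rules out a linear relationship only; a non-linear dependence could still exist.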
Figures 4 and 5 are box plots depicting the distribution of replacement cost for each unique value of the two categorical characteristics of films in our data: film rating and special features. Curiously, the minimum and maximum replacement costs are almost identical across the values of both characteristics. Combined with the similar interquartile ranges of replacement cost across those values, this suggests that the categorical variables are also not clearly connected to replacement cost.
Note: For Figure 4, hover your cursor over individual box plots to better see the special feature groupings that they correspond to.
Figure 4
Figure 5
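The quantities a box plot displays (minimum, quartiles, maximum, and interquartile range) can also be compared numerically per category. The following is a minimal sketch using film rating as the grouping variable; the rows are hypothetical, and the inclusive quartile method is an assumption, since plotting tools differ in how they compute quartiles.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (rating, replacement_cost) pairs, invented for illustration.
rows = [
    ("G", 9.99), ("G", 14.99), ("G", 19.99), ("G", 24.99), ("G", 29.99),
    ("PG", 9.99), ("PG", 12.99), ("PG", 19.99), ("PG", 26.99), ("PG", 29.99),
    ("R", 9.99), ("R", 16.99), ("R", 20.99), ("R", 22.99), ("R", 29.99),
]


def box_stats(values):
    """Five-number summary plus interquartile range for one group."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Collect the costs belonging to each rating.
groups = defaultdict(list)
for rating, cost in rows:
    groups[rating].append(cost)

stats = {rating: box_stats(costs) for rating, costs in groups.items()}

for rating, s in stats.items():
    print(
        f"{rating}: min={s['min']:.2f} q1={s['q1']:.2f} "
        f"median={s['median']:.2f} q3={s['q3']:.2f} "
        f"max={s['max']:.2f} iqr={s['iqr']:.2f}"
    )
```

If the groups share nearly the same minimum, maximum, and interquartile range, as described above, the grouping variable is unlikely to explain much of the variation in replacement cost on its own.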
Conclusion
This preliminary analysis found no easily identifiable connections between a film’s characteristics and its replacement cost. For future analysis, it might be helpful to model non-linear relationships and interactions between variables as they relate to replacement cost. This could be done by fitting a tree-based model and interpreting the feature importances.
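To give a sense of what a tree-based feature importance measures, the sketch below fits the simplest possible tree: a single-split regression stump per feature. It is a deliberately reduced stand-in, not the model proposed above; a real analysis would more likely use a full decision tree or ensemble from an established library, and a single split cannot capture interactions between variables. The data and column names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split_gain(xs, ys):
    """Largest reduction in SSE achievable by one threshold split on xs."""
    total = sse(ys)
    best = 0.0
    # Try every distinct value except the largest as a threshold, so that
    # both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        best = max(best, total - sse(left) - sse(right))
    return best


# Hypothetical data: replacement cost jumps once rental duration exceeds 4,
# while length is unrelated to cost. Values are invented for this sketch.
features = {
    "rental_duration": [3, 3, 4, 5, 6, 7, 7, 5],
    "length": [60, 150, 90, 120, 75, 140, 100, 85],
}
replacement_cost = [10.99, 11.99, 10.99, 24.99, 25.99, 24.99, 25.99, 24.99]

gains = {name: best_split_gain(xs, replacement_cost) for name, xs in features.items()}

# Normalise so the importances sum to 1, mirroring the convention used by
# impurity-based importances in tree libraries.
total_gain = sum(gains.values())
importances = {name: gain / total_gain for name, gain in gains.items()}

for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

A feature scores highly here when splitting on it separates cheap films from expensive ones, which is a threshold effect that a correlation coefficient can understate.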
Figure 6
Figure 7